Grammatical annotation of the Portuguese C-ORAL Corpus

Author

  • Eckhard Bick

Similar resources

Grammatical Annotation of Historical Portuguese: Generating a Corpus-Based Diachronic Dictionary

In this paper, we present an automatic system for the morphosyntactic annotation and lexicographical evaluation of historical Portuguese corpora. Using rule-based orthographical normalization, we were able to apply a standard parser (PALAVRAS) to historical data (Colonia corpus) and to achieve accurate annotation for both POS and syntax. By aligning original and standardized word forms, our met...

The C-ORAL-BRASIL I: Reference Corpus for Spoken Brazilian Portuguese

C-ORAL-BRASIL I is a Brazilian Portuguese spontaneous speech corpus compiled following the same architecture adopted by the C-ORAL-ROM resource. The main goal is the documentation of the diaphasic and diastratic variations in Brazilian Portuguese. The diatopic variety represented is that of the metropolitan area of Belo Horizonte, capital city of Minas Gerais. Even though it was not a primary g...

The annotation of the C-ORAL-BRASIL spoken corpus using an adaptation of the Palavras Parser

This article describes the morphosyntactic annotation of the C-ORAL-BRASIL speech corpus, using an adapted version of the Palavras parser. In order to achieve compatibility with annotation rules designed for standard written Portuguese, transcribed words were orthographically normalized, and the parsing lexicon augmented with speech-specific material, phonetically spelled abbreviations etc. Usi...

Challenges in modality annotation in a Brazilian Portuguese Spontaneous Speech Corpus

This short paper presents preliminary notes on a modality annotation system under development for a Brazilian Portuguese spontaneous speech corpus (C-ORAL-BRASIL). We describe our methodological decisions, the points that seem to be well resolved, and two issues for further discussion and investigation.

When CORDIAL Becomes Friendly: Endowing the CORDIAL Corpus with a Syntactic Annotation Layer

This paper reports on the syntactic annotation of a previously compiled and tagged corpus of European Portuguese (EP) dialects – The Syntax-oriented Corpus of Portuguese Dialects (CORDIAL-SIN). The parsed version of CORDIAL-SIN is intended to be a more efficient resource for the purpose of studying dialect syntax by allowing automated searches for various syntactic constructions of interest. To...

Publication date: 2011